
Poster: stringybark Date: Apr 19, 2008 2:23pm
Forum: texts Subject: How do Microsoft digitize texts?

Can anyone tell me what Microsoft uses for its book digitization? Mainly, which bitmap compression is used? When a page is enlarged, the fonts and illustrations have a sort of 'watercolour' appearance. I'm just curious to know.


Poster: brewster (Administrator, Curator, or Staff) Date: May 4, 2008 7:46am
Forum: texts Subject: Re: How do Microsoft digitize texts?


Microsoft digitizes books many different ways. One way is to sponsor digitization done by the Internet Archive.

Those books are photographed in color and then put through a set of processing steps, including Optical Character Recognition, to create searchable PDFs, DjVu files, and a flip book.

See the Scribe Bookscanning system for more information.

I hope this answers your questions.

-brewster


Poster: stringybark Date: May 4, 2008 11:39pm
Forum: texts Subject: Re: How do Microsoft (make that Scribe) digitize texts?

Thanks for that lead, Brewster, I found a couple of things (oddly, mentioning a guy named Brewster):

redjar.org/jared/blog/archives/category/preservation/digitization/
aipengineering.com/scribe/
www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2005/11/22/MNGQ0FSCCT1.DTL&hw=brewster+kahle&sn=001&sc=1000
www.linux.com/feature/61054
news.zdnet.com/2100-9588_22-5915690.html
plus a really useless Wikipedia entry.

So it's Scribe that does the work; Microsoft pays the money and gets watermarking rights. But the really intriguing thing is the way the image is processed - it seems to re-composite the photographed image as a number of layers, and the compression is definitely not JPEG.

There should be more of this form of digitization: the results are excellent, and it definitely allows access to material that is practically impossible to get hold of. Hence it would be excellent if the technological details were well known. That would allow lots of corridor talk, leading maybe to more players on the Internet Archive side.


Poster: agharbeia Date: Jul 14, 2008 7:05am
Forum: texts Subject: Re: How do Microsoft digitize texts?

Are they still digitising?

I thought Microsoft had wound their digitisation initiative down.

This post was modified by agharbeia on 2008-07-14 14:05:43